1. Introduction



Empirical Likelihood (EL), introduced by Owen (1988), is a powerful semi-parametric
method. It can be used in a very general setting and leads to effective estimation, tests
and confidence intervals. This method shares many good properties with the conventional
parametric log-likelihood ratio: both statistics have a $\chi^2$ limiting distribution and
are Bartlett correctable, meaning that the error can be reduced from $O(n^{-1})$ to
$O(n^{-2})$ by a simple adjustment. An additional property of EL is that the corresponding
confidence intervals and tests do not rely on an estimator of the variance. This last
property is especially valuable for dependent data, since estimating the variance is then a
challenging issue.

Owen's framework has been intensively studied in the 90's (see Owen 2001, for an
overview), leading to many generalizations and applications, but mainly in an i.i.d.
setting. Some adaptations of EL for dependent data have been introduced, such as the Block
Empirical Likelihood (BEL) of Kitamura (1997) for weakly dependent processes or the
subject-wise and elementwise empirical likelihoods of Wang et al. (2010) for longitudinal
data. BEL is inspired by similarities with the bootstrap methodology. Kitamura proposed
to apply the empirical likelihood framework not directly on the data but on blocks of
consecutive data, to catch the dependence structure. This idea, known as Block Bootstrap
(BB) or blocking technique (in the probabilistic literature, see Doukhan and Ango Nze
2004, for references), goes back to Künsch (1989) in the bootstrap literature and has been
intensively exploited in this field (see Lahiri 2003, for a survey). However, the BB
performance has been questioned, see Götze and Künsch (1996) and Horowitz (2003). Indeed,
it is known that the blocking technique distorts the dependence structure of the data
generating process and that its performance strongly relies on the choice of the block size.
From a theoretical point of view, the assumptions used to prove the validity of BB and
BEL are generally strong: it is assumed that the process is stationary and satisfies some
strong-mixing properties (some non-stationary processes can nevertheless be handled, see
Synowiecki (2007) for example). In addition, to have a precise control of the coverage
probability of the confidence intervals, we have to assume that the strong mixing
coefficients are exponentially decreasing (see Kitamura 1997; Lahiri 2003). Moreover, the
choice of the tuning parameter (the block size) may be quite difficult from a practical
point of view.

In this paper, we focus on generalizing empirical likelihood to Markov chains.
Questioning the restriction implied by the Markovian setting is a natural issue. It should
be mentioned that homogeneous Markov chain models cover a huge number of time series
models. In particular, a Markov chain can always be written in a nonparametric
way: $X_i = h(X_{i-1}, \ldots, X_{i-p}, \varepsilon_i)$, where $(\varepsilon_i)_{i \ge 0}$ is
i.i.d. with density $f$ and, for $i > 0$, $\varepsilon_i$ is independent of
$(X_k)_{0 \le k < i}$ (see Kallenberg 2002). Note that both $h$ and $f$ are unknown
functions. Such representations explain why, provided that $p$ is large enough,
any process of length $n$ can be generated by a Markov chain, see Knight (1975). Note
also that a Markov chain is not necessarily strong-mixing. For instance, the simple
linear model $X_i = \frac{1}{2}(X_{i-1} + \varepsilon_i)$ with
$P(\varepsilon_i = 1) = P(\varepsilon_i = 0) = \frac{1}{2}$ is not strong mixing
(see Doukhan and Ango Nze 2004, for results on dependence in Econometrics).
Doukhan and Ango Nze (2004) give many classical econometric models that can be seen
as Markovian: ARMA, ARCH and GARCH processes, bilinear and threshold models.

Our approach is also inspired by some recent developments in the bootstrap literature
on Markov chains: instead of choosing blocks of constant length, we use the Markov
chain structure to choose some adequate cutting times and then we obtain blocks of
various lengths. This construction, introduced in Bertail and Clémençon (2004), catches
the dependence structure. It is originally based on the existence of an atom for the chain,
i.e. an accessible set on which the transition kernel is constant (see Meyn and Tweedie
2009, chapter 1.5). The existence of an atom allows us to cut the chain into regeneration
blocks, separated from each other by a visit to the atom. These blocks (of random
lengths) are independent by the strong Markov property. Once these blocks are obtained,
the Regenerative Block-Bootstrap (RBB) consists in resampling the data blocks to build
new regenerative processes. The rate of convergence of the pivotal statistic obtained
by resampling these blocks ($O(n^{-1+\varepsilon})$) is better than the one obtained for the
Block Bootstrap ($O(n^{-3/4})$) and is close to the classical rate $O(n^{-1})$ obtained in
the i.i.d. case, see Götze and Künsch (1996) and Lahiri (2003).



These improvements suggest that a version of the empirical likelihood (EL) method
based on such blocks could yield improved results in comparison to the method presented
in Kitamura (1997). Indeed, it is known that EL enjoys roughly the same properties in
terms of accuracy as the bootstrap, but without any Monte-Carlo step. The main idea is
to consider the renewal blocks as independent observations and to follow the empirical
likelihood method. Such a program is made possible by transforming the original problem
based on moments under the stationary distribution into an equivalent problem under
the distribution of the observable blocks (via Kac's Theorem). The advantages of the
method proposed in this paper are at least twofold. First, the construction of the blocks
is automatic and entirely determined by the data: it leads to a unique version of the
empirical likelihood program. Second, there is no need to assume stationarity or any
strong mixing condition to obtain a better coverage probability for the corresponding
confidence regions.

Assuming that the chain is atomic is a strong restriction of this method. This
hypothesis holds for discrete Markov chains and queuing (or storage) systems returning to a
stable state (for instance the empty queue): see chapter 2.4 of Meyn and Tweedie (2009).
However, this method can be extended to the more general case of Harris chains. Indeed,
any chain having some recurrence properties can be extended to a chain possessing an
atom, which then enjoys some regenerative properties. Nummelin gives an explicit
construction of such an extension that we recall in Section 4 (see Athreya and Ney 1978;
Nummelin 1978). In Bertail and Clémençon (2006), an extension of the RBB procedure
to general Harris chains based on the Nummelin splitting technique is proposed (the
Approximate Regenerative Block-Bootstrap, ARBB). One purpose of this paper is to prove
that these approximately regenerative blocks can also be used in the framework of
empirical likelihood and lead to consistent results.

The outline of the paper is the following. In Section 2, notations are set out and key
concepts of the Markov atomic chain theory are recalled. In Section 3, we present how to
construct regenerative data blocks and confidence regions based on these blocks. We give
the main properties of the corresponding asymptotic statistics. In Section 4, the Nummelin
splitting technique is briefly recalled and a framework to adapt the regenerative empirical
likelihood method to general Harris chains is proposed. We essentially obtain consistency
results but also briefly discuss tests and higher order properties. In Section 5, we present
some moderate sample size simulations.

2. Preliminary statement



2.1. Framework 



For the sake of simplicity, we use the same notations as Bertail and Clémençon (2004)
when possible. We consider a chain $X = (X_i)_{i \in \mathbb{N}}$ on a state space
$(E, \mathcal{E})$, with initial distribution $\nu$ and transition probability $\Pi$. For a
set $B \in \mathcal{E}$ and $i \in \mathbb{N}$, we thus denote

$$X_0 \sim \nu \quad \text{and} \quad \mathbb{P}(X_i \in B \mid X_0, \ldots, X_{i-1}) = \Pi(X_{i-1}, B) \ \text{a.s.}$$

Recurrence properties will be important in the following. An irreducible chain is said to be
positive recurrent when it admits an invariant probability:

$$\exists\, \mu \text{ probability measure on } E, \quad \mu\Pi = \mu, \quad \text{where } \mu\Pi(\cdot) = \int_{x \in E} \mu(dx)\, \Pi(x, \cdot).$$

We assume that the chain is aperiodic (i.e. $X$ is not cyclic) and that there exists a
measure $\psi$ such that the chain is $\psi$-irreducible. This simply means that for any
starting state $x$ in $E$ and any set $A$ of positive $\psi$-measure, the chain visits $A$
with positive probability. A $\psi$-irreducible chain is said to be Harris recurrent if every
measurable set with positive $\psi$-measure visited once is visited infinitely often with
probability 1.

In what follows, $\mathbb{P}_\nu$ and $\mathbb{P}_x$ (for $x$ in $E$) denote respectively the
probability measure when $X_0 \sim \nu$ and $X_0 = x$. The indicator function of an event $A$
is denoted by $\mathbf{1}_A$. The corresponding expectations are denoted
$\mathbb{E}_\nu[\cdot]$, $\mathbb{E}_x[\cdot]$ and $\mathbb{E}_A[\cdot]$. For further details
and traditional properties of Markov chains, we refer to Revuz (1984) or Meyn and Tweedie
(2009).



Notice that the chain $X$ is not supposed to be stationary (since $\nu$ may differ from
$\mu$) nor strong-mixing. To simplify the exposition, we do not treat in this paper the fully
non-stationary case corresponding to null recurrence. Results in that direction may be found
in Tjøstheim (1990).



2.2. Atomic Markov chains 

Assume that the chain is $\psi$-irreducible and possesses an accessible atom, i.e. a set $A$
with $\psi(A) > 0$ such that $\Pi(x, \cdot) = \Pi(y, \cdot)$ for all $x, y$ in $A$. The class
of atomic Markov chains contains not only chains defined on a countable state space but also
many specific Markov models used to study queuing systems and stock models (see Asmussen
2003, for models involved in queuing theory). In the discrete case, any recurrent state is an
accessible atom: the choice of the atom is thus left to the statistician, who can for instance
use the most visited point. In many other situations the atom is determined by the structure
of the model (for a random walk on $\mathbb{R}^+$ with continuous increments, $0$ is the only
possible atom).

Denote by $\tau_A = \tau_A(1) = \inf\{k \ge 1,\ X_k \in A\}$ the hitting time of the atom $A$
(the first visit) and, for $j \ge 2$, denote by
$\tau_A(j) = \inf\{k > \tau_A(j-1),\ X_k \in A\}$ the successive return times to $A$. The
sequence $(\tau_A(j))_{j \ge 1}$ defines the successive times at which the chain forgets its
past, called regeneration times. Indeed, the transition probability being constant on the
atom, $X_{\tau_A + 1}$ only depends on the information that $X_{\tau_A}$ is in $A$ and no
longer on the actual value of $X_{\tau_A}$ itself.

For any initial distribution $\nu$, the sample path of the chain may be divided into blocks
of random length corresponding to consecutive visits to $A$:

$$B_0 = (X_1, \ldots, X_{\tau_A(1)}), \quad B_1 = (X_{\tau_A(1)+1}, \ldots, X_{\tau_A(2)}), \quad \ldots, \quad B_j = (X_{\tau_A(j)+1}, \ldots, X_{\tau_A(j+1)}), \quad \ldots$$

The sequence of blocks $(B_j)_{1 \le j < \infty}$ is then i.i.d. by the strong Markov property
(Meyn and Tweedie 2009, page 73). Notice that the block $B_0 = (X_1, \ldots, X_{\tau_A})$ is
independent of the other blocks, but does not have the same distribution, because it depends
on the initial distribution $\nu$.
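The cutting of a trajectory into blocks at the visits to the atom can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `regeneration_blocks` and the predicate `in_atom` are hypothetical, and the toy trajectory is invented for the example (the atom is the state 0).

```python
def regeneration_blocks(path, in_atom):
    """Cut a trajectory at the visits to the atom.

    Returns (first, complete, last): the initial block B_0, the list of
    complete blocks B_1, ..., B_{l_n}, and the (possibly empty) last piece.
    `in_atom` is a hypothetical predicate telling whether a state lies in A.
    """
    # 0-based positions of the successive visits to the atom
    hits = [i for i, x in enumerate(path) if in_atom(x)]
    if not hits:
        # The chain never visits A: no regeneration is observed.
        return list(path), [], []
    first = list(path[: hits[0] + 1])
    # Each complete block starts right after a visit and ends at the next visit.
    complete = [list(path[s + 1 : e + 1]) for s, e in zip(hits, hits[1:])]
    last = list(path[hits[-1] + 1 :])
    return first, complete, last


# Toy trajectory on a countable state space, with atom A = {0}.
path = [2, 0, 1, 3, 0, 0, 2, 1, 0, 4]
first, complete, last = regeneration_blocks(path, lambda x: x == 0)
print(first, complete, last)
```

Every complete block ends with a visit to the atom, and the number of complete blocks is the number of visits minus one.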

Let $m : E \times \mathbb{R}^p \to \mathbb{R}^r$ be a measurable function and
$\theta_0 \in \mathbb{R}^p$ be the true value of some parameter $\theta$ of the chain, given by
an estimating equation on the invariant measure $\mu$:

$$\mathbb{E}_\mu[m(X, \theta_0)] = 0. \tag{1}$$

Dimensions are of importance: the number of constraints $r$ must be at least equal to
the number of parameters $p$, for identification reasons. The estimation of the mean is a
just-identified case ($r = p$): $\theta_0 = \mathbb{E}_\mu[X]$ and $m(X, \theta) = X - \theta$.

In this framework, Kac's Theorem, stated below (Meyn and Tweedie 2009, Theorem 10.2.2),
allows us to write functionals of the stationary distribution $\mu$ as functionals
of the distribution of a regenerative block.

Theorem 2.1 (Kac): Let $X$ be an aperiodic, $\psi$-irreducible Markov chain with an accessible
atom $A$. $X$ is positive recurrent if and only if $\mathbb{E}_A[\tau_A] < \infty$. In such a
case, $X$ admits a unique invariant probability distribution $\mu$, the Pitman's occupation
measure given by

$$\mu(F) = \frac{\mathbb{E}_A\left[\sum_{i=1}^{\tau_A} \mathbf{1}_{X_i \in F}\right]}{\mathbb{E}_A[\tau_A]}, \quad \text{for all } F \in \mathcal{E}.$$

In the following we denote

$$M(B_j, \theta) = \sum_{i=\tau_A(j)+1}^{\tau_A(j+1)} m(X_i, \theta)$$

so that we can rewrite the estimating equation (1) as:

$$\mathbb{E}_A[M(B_j, \theta_0)] = 0. \tag{2}$$
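The passage from equation (1) to equation (2) can be made explicit with a short worked step. It is a sketch, assuming that $m(\cdot, \theta)$ is $\mu$-integrable so that the occupation-measure formula of Kac's Theorem extends from indicator functions to $m(\cdot, \theta)$ by linearity and monotone convergence:

```latex
\mathbb{E}_\mu[m(X, \theta)]
  = \int_E m(x, \theta)\, \mu(dx)
  = \frac{\mathbb{E}_A\left[\sum_{i=1}^{\tau_A} m(X_i, \theta)\right]}{\mathbb{E}_A[\tau_A]}
  = \frac{\mathbb{E}_A[M(B_j, \theta)]}{\mathbb{E}_A[\tau_A]},
  \qquad j \ge 1 .
```

The last equality uses the fact that every block $B_j$, $j \ge 1$, is distributed as the path of the chain started from the atom and stopped at its first return. Since $0 < \mathbb{E}_A[\tau_A] < \infty$ in the positive recurrent case, the left-hand side vanishes at $\theta_0$ if and only if $\mathbb{E}_A[M(B_j, \theta_0)] = 0$.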



Kac's Theorem allows us to use the decomposition of the chain into independent blocks
to obtain limit theorems for atomic chains. See for example Meyn and Tweedie (2009) for
the Law of Large Numbers (LLN, page 415), Central Limit Theorem (CLT, page 416) and
Law of Iterated Logarithm (page 416), Bolthausen (1982) for the Berry-Esseen Theorem
and Bertail and Clémençon (2004) for Edgeworth expansions. These results are established
under hypotheses related to the distribution of the $B_j$'s.

Return time conditions:

H0(κ) : $\mathbb{E}_A[\tau_A^\kappa] < \infty$,
H0(κ, ν) : $\mathbb{E}_\nu[\tau_A^\kappa] < \infty$,

where $\kappa > 0$ and $\nu$ is the initial distribution of the chain. When the chain is
stationary and strong mixing, these hypotheses can be related to the rate of decay of the
$\alpha$-mixing coefficients $\alpha(p)$, see Bolthausen (1982). In particular, the hypotheses
are satisfied if $\sum_{j \ge 1} j^{\kappa} \alpha(j) < \infty$.

Block-moment conditions:

H1(κ, m) : $\mathbb{E}_A[\ldots] < \infty$,
H1(κ, ν, m) : $\mathbb{E}_\nu[\ldots] < \infty$.

The assumptions on $\nu$ make it possible to control the first block $B_0$. Equivalence of
these assumptions with easily checkable drift conditions may be found in Meyn and Tweedie
(2009), Appendix A.



3. The regenerative case 

3.1. Regenerative Block Empirical Likelihood algorithm 

Let $X_1, \ldots, X_n$ be an observation of the chain $X$. If we assume that we know an atom
$A$ for the chain, the construction of the regenerative blocks is then trivial. Consider the
empirical distribution of the blocks:

$$P_{l_n} = \frac{1}{l_n} \sum_{j=1}^{l_n} \delta_{B_j},$$

where $l_n$ is the number of complete regenerative blocks, and the multinomial distributions

$$Q = \sum_{j=1}^{l_n} q_j \delta_{B_j}, \quad \text{with } 0 < q_j \text{ and } \sum_{j=1}^{l_n} q_j = 1,$$

dominated by $P_{l_n}$. To obtain a confidence region, we apply Owen (1990)'s method
to the blocks $B_j$: we minimize the Kullback discrepancy between $Q$ and $P_{l_n}$
under the condition (2). More precisely, the Regenerative Block Empirical Likelihood is
defined in the next 4 steps:

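Algorithm 1 ReBEL - Regenerative Block Empirical Likelihood construction:

(1) Count the number of visits to $A$ up to time $n$: $l_n + 1 = \sum_{i=1}^{n} \mathbf{1}_{X_i \in A}$.

(2) Divide the observed trajectory $X^{(n)} = (X_1, \ldots, X_n)$ into $l_n + 2$ blocks
corresponding to the pieces of the sample path between consecutive visits to the atom $A$,

$$B_0 = (X_1, \ldots, X_{\tau_A(1)}), \quad B_1 = (X_{\tau_A(1)+1}, \ldots, X_{\tau_A(2)}), \quad \ldots,$$
$$B_{l_n} = (X_{\tau_A(l_n)+1}, \ldots, X_{\tau_A(l_n+1)}), \quad B^{(n)}_{l_n+1} = (X_{\tau_A(l_n+1)+1}, \ldots, X_n),$$

with the convention $B^{(n)}_{l_n+1} = \emptyset$ when $\tau_A(l_n + 1) = n$.

(3) Drop the first block $B_0$ and the last one $B^{(n)}_{l_n+1}$ (possibly empty
when $\tau_A(l_n + 1) = n$).

(4) Evaluate the empirical log-likelihood ratio $r_n(\theta)$ (practically on a grid of the
set of interest):

$$r_n(\theta) = -\sup_{(q_1, \ldots, q_{l_n})} \left\{ \log\left[\prod_{j=1}^{l_n} l_n q_j\right] \;\middle|\; \sum_{j=1}^{l_n} q_j \, M(B_j, \theta) = 0, \ \sum_{j=1}^{l_n} q_j = 1 \right\}.$$

Using Lagrange arguments, this can be more easily calculated as

$$r_n(\theta) = \sup_{\lambda \in \mathbb{R}^r} \left\{ \sum_{j=1}^{l_n} \log\left[1 + \lambda' M(B_j, \theta)\right] \right\}.$$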
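The Lagrange argument behind the dual form can be spelled out as follows. This is the standard empirical likelihood computation applied to the blocks, written with the shorthand $M_j = M(B_j, \theta)$ (a notation introduced here only for brevity), and it assumes that $0$ lies in the interior of the convex hull of the $M_j$ so that the constrained problem is feasible:

```latex
\begin{aligned}
&\mathcal{L}(q, \lambda, \gamma)
  = \sum_{j=1}^{l_n} \log(l_n q_j)
    - l_n \lambda' \sum_{j=1}^{l_n} q_j M_j
    - \gamma \Big( \sum_{j=1}^{l_n} q_j - 1 \Big), \\
&\frac{\partial \mathcal{L}}{\partial q_j} = \frac{1}{q_j} - l_n \lambda' M_j - \gamma = 0
  \;\Longrightarrow\;
  \gamma = l_n \ \text{(multiply by } q_j \text{ and sum over } j\text{)},
  \qquad q_j = \frac{1}{l_n \,(1 + \lambda' M_j)}, \\
&\text{where } \lambda \text{ solves } \sum_{j=1}^{l_n} \frac{M_j}{1 + \lambda' M_j} = 0,
  \qquad\text{so that}\qquad
  -\log \prod_{j=1}^{l_n} l_n q_j = \sum_{j=1}^{l_n} \log\big(1 + \lambda' M_j\big).
\end{aligned}
```

The equation defining $\lambda$ is the first-order condition of the concave map $\lambda \mapsto \sum_j \log(1 + \lambda' M_j)$, which is why the ratio can be written as a supremum over $\lambda$.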



Remark 1 Small samples: Possibly, if the chain does not visit $A$, $l_n = -1$. Of course
the algorithm cannot be implemented and no confidence interval can be built. Actually,
even when $l_n \ge 0$, the algorithm can be meaningless, and at least a reasonable number
of blocks is needed to build a confidence interval. In the positive recurrent case, it is
known that $l_n \sim n / \mathbb{E}_A[\tau_A]$ a.s. and the length of each block has
expectation $\mathbb{E}_A[\tau_A]$. Many regenerations of the chain should then be observed
as soon as $n$ is significantly larger than $\mathbb{E}_A[\tau_A]$. Of course, the next
results are asymptotic; for finite sample considerations on empirical likelihood methods (in
the i.i.d. setting), refer to Bertail et al. (2008).



The next theorem states the asymptotic validity of ReBEL in the case $r = p$
(just-identified case). For this, we introduce the ReBEL confidence region defined as follows:

$$C_{n,\alpha} = \left\{ \theta \in \mathbb{R}^p \;\middle|\; 2 \cdot r_n(\theta) \le F^{-1}_{\chi^2_p}(1 - \alpha) \right\},$$

where $F_{\chi^2_p}$ is the distribution function of a $\chi^2$ distribution with $p$ degrees
of freedom.
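As an illustration, the ratio and the confidence region can be evaluated on a grid in the scalar mean case ($r = p = 1$, $m(x, \theta) = x - \theta$). The sketch below is not the authors' implementation: the function names, the bisection solver for the Lagrange multiplier and the toy blocks are assumptions made for the example, and the constant is the 0.95 quantile of the $\chi^2_1$ distribution.

```python
import math

CHI2_1_Q95 = 3.841458820694124  # 0.95 quantile of the chi-square law with 1 degree of freedom


def block_sums(blocks, theta):
    """M(B_j, theta) for the mean, i.e. with m(x, theta) = x - theta."""
    return [sum(x - theta for x in block) for block in blocks]


def rebel_log_ratio(M, n_iter=200):
    """Scalar dual form: sup over lambda of sum_j log(1 + lambda * M_j)."""
    m_min, m_max = min(M), max(M)
    if not (m_min < 0.0 < m_max):
        return math.inf  # 0 is not inside the hull of the M_j: no feasible weights
    # Feasible lambdas keep every 1 + lambda * M_j strictly positive.
    lo, hi = -1.0 / m_max, -1.0 / m_min
    width = hi - lo
    lo, hi = lo + 1e-10 * width, hi - 1e-10 * width
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        # The score is decreasing in lambda, so bisection finds its unique zero.
        score = sum(v / (1.0 + mid * v) for v in M)
        if score > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return sum(math.log(1.0 + lam * v) for v in M)


def confidence_set(blocks, grid, quantile=CHI2_1_Q95):
    """Grid points theta with 2 * r_n(theta) below the chi-square quantile."""
    return [t for t in grid if 2.0 * rebel_log_ratio(block_sums(blocks, t)) <= quantile]


# Toy complete blocks (first and last blocks already dropped), atom A = {0}.
blocks = [[1, 3, 0], [0], [2, 1, 0], [4, 0], [1, 1, 0]]
grid = [i / 100 for i in range(0, 301)]
region = confidence_set(blocks, grid)
print(min(region), max(region))
```

The ratio vanishes at the ratio-type estimate (total of the blocks divided by their total length, here 13/12), which therefore always belongs to the region. With only five blocks the region is of course not meaningful statistically; the example only shows the mechanics.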
Remark 2 Scaling factor: Block empirical likelihood methods usually need a scaling
factor to compensate for the possible overlap of the blocks (see Kitamura 1997,
page 2089). The regenerative perspective used here forbids any overlap and therefore avoids
such a factor.



Theorem 3.1 : Let $\mu$ be the invariant measure of the chain, let
$\theta_0 \in \mathbb{R}^p$ be the parameter of interest, satisfying
$\mathbb{E}_\mu[m(X, \theta_0)] = 0$. Assume

$$\Sigma = \mathbb{E}_A[\tau_A]^{-1} \, \mathbb{E}_A\left[M(B, \theta_0) M(B, \theta_0)'\right]$$

is of full-rank. If H0(1, ν), H0(2) and H1(2, m) hold, then

$$2 r_n(\theta_0) \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_p$$

and therefore

$$\mathbb{P}_\nu(\theta_0 \in C_{n,\alpha}) \xrightarrow[n \to \infty]{} 1 - \alpha.$$

The proof relies on the same arguments as the one for empirical likelihood based on
i.i.d. data (Owen 1990, 2001). This can be easily understood: our data, the regenerative
blocks, are i.i.d. The only difference with the classical use of empirical likelihood is
that the length of the data (i.e. the number of blocks) is a random value $l_n$. However, we
have that $n / l_n \to \mathbb{E}_A[\tau_A]$ a.s. (Meyn and Tweedie 2009, page 425). The proof
is given in the appendix.

Remark 3 Convergence rate: Let us briefly discuss the rate of
convergence of this method. Bertail and Clémençon (2004) show that the Edgeworth
expansion of the mean standardized by the empirical variance holds up to $O_\nu(n^{-1})$ (in
contrast to what is expected when considering a variance built on fixed length blocks).
It follows from their result that

$$\mathbb{P}_\nu\left(2 r_n(\theta_0) \le u\right) = F_{\chi^2_p}(u) + O_\nu(n^{-1}).$$

This is already (without Bartlett correction) better than the Bartlett corrected empirical
likelihood when fixed length blocks are used (Kitamura 1997). Actually, we expect, in
this atomic framework, that a Bartlett correction would lead to the same result as in the
i.i.d. case: $O(n^{-2})$. However, to prove this conjecture, we should establish an Edgeworth
expansion for the likelihood ratio (which can be derived from the Edgeworth expansion
for self-normalized sums) up to order $O(n^{-2})$, which is a very technical task. This is left
for further work.

Remark 4 Change of discrepancy: Empirical likelihood can be seen as a contrast
method based on the Kullback discrepancy. Replacing the Kullback discrepancy by some
other discrepancy is an interesting problem which has led to some recent works in the i.i.d.
case. Newey and Smith (2004) generalized empirical likelihood to the family of Cressie-Read
discrepancies (see also Guggenberger and Smith 2005). The resulting methodology,
Generalized Empirical Likelihood, is included in the empirical $\varphi^*$-discrepancy method
introduced by Bertail et al. (2008) (see also Bertail et al. 2007; Kitamura 2006).

In the dependent case, it should be mentioned that the constant length blocks procedure
has been studied in the case of empirical Euclidean likelihood by Lin and Zhang
(2001). A method based on the Cressie-Read discrepancies for tilting time series data has
been introduced by Hall and Yao (2003). Our proposal, stated here for the Kullback
discrepancy only, is straightforwardly compatible with these generalizations (Cressie-Read
and $\varphi^*$-discrepancy).

An important issue is the behavior of the empirical log-likelihood ratio under a local
alternative, i.e. if the moment equation (1) is misspecified:
$\mathbb{E}_\mu[m(X, \theta_0)] = \delta / \sqrt{n}$. The result is stated as follows.

Theorem 3.2 : Let $\mu$ be the invariant measure of the chain, let
$\theta_0 \in \mathbb{R}^p$ be the parameter of interest, satisfying
$\mathbb{E}_\mu[m(X, \theta_0)] = \delta / \sqrt{n}$. Assume that $\Sigma$ is of full-rank.
If H0(1, ν), H0(2) and H1(2, m) hold, then the empirical log-likelihood ratio has an
asymptotic noncentral chi-square distribution with $p$ degrees of freedom and noncentrality
parameter $\delta' \Sigma^{-1} \delta$:

$$2 r_n(\theta_0) \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_p(\delta' \Sigma^{-1} \delta).$$

The proof is postponed to the appendix. It is a classical result that the log-likelihood
ratio is asymptotically noncentral chi-square and that the critical order is $n^{-1/2}$. The
interesting quantity to study the efficiency of the method in this context is the
noncentrality parameter. Newey (1985) gives the asymptotic distribution of the pivotal
statistic based on optimally weighted GMM, which is a standard tool for dependent data.
Unfortunately, Newey's results are stated in a parametric context and it is therefore
impossible to compare them with Theorem 3.2.

Nevertheless, ReBEL can easily be compared with the Continuously Updated GMM
(CUE-GMM), which is very close to the optimally weighted GMM. CUE-GMM estimators
have been shown to coincide with empirical Euclidean likelihood (EEL), see Antoine et al.
(2007). The difference between EL and EEL being just a change of discrepancy (see
Remark 4), it is then straightforward to adapt the proof of Theorem 3.2 to the case of
the EEL. The expansions of the two pivotal statistics coincide in their first two orders and
therefore they lead to the same asymptotic distribution in the case of misspecification.
EL is thus as efficient as the optimally weighted GMM.



3.2. Estimation and the over-identified case 

The properties of empirical likelihood proved by Qin and Lawless (1994) can be extended
to our Markovian setting. In order to state the corresponding results respectively on
estimation, confidence region under over-identification ($r > p$) and hypotheses testing, we
introduce the following additional assumptions. Assume that there exists a neighborhood
$V$ of $\theta_0$ and a real positive function $N$ with $\mathbb{E}_\mu[N(X)] < \infty$, such
that:

H2(a) $\partial m(x, \theta) / \partial \theta$ is continuous in $\theta$ and bounded in norm by $N(x)$ for $\theta$ in $V$,

H2(b) $D = \mathbb{E}_\mu[\partial m(X, \theta_0) / \partial \theta]$ is of full rank,

H2(c) $\partial^2 m(x, \theta) / \partial \theta \partial \theta'$ is continuous in $\theta$ and bounded in norm by $N(x)$ for $\theta$ in $V$,

H2(d) $\|m(x, \theta)\|^3$ is bounded by $N(x)$ on $V$.

Notice that, by Kac's Theorem, H2(d) implies in particular the block moment condition
H1(3, m).

Empirical likelihood provides a natural way to estimate $\theta_0$ in the i.i.d. case
(Qin and Lawless 1994). This can be straightforwardly extended to Markov chains. The
estimator is the maximum empirical likelihood estimator defined by

$$\hat{\theta}_n = \arg\inf_{\theta} \{ r_n(\theta) \}.$$

The next theorem shows that, under natural assumptions on $m$ and $\mu$, $\hat{\theta}_n$ is
an asymptotically Gaussian estimator of $\theta_0$.

Theorem 3.3 : Assume that the hypotheses of Theorem 3.1 hold. Under the additional
assumptions H2(a), H2(b) and H2(d), $\hat{\theta}_n$ is a consistent estimator of
$\theta_0$. If in addition H2(c) holds, then $\hat{\theta}_n$ is asymptotically Gaussian:

$$\sqrt{n}\,(\hat{\theta}_n - \theta_0) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\left(0, \left(D' \Sigma^{-1} D\right)^{-1}\right).$$

Notice that both $D$ and $\Sigma$ can be easily estimated by empirical sums over the blocks.
The corresponding estimator for $(D' \Sigma^{-1} D)^{-1}$ is straightforwardly convergent by
the LLN for Markov chains.
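One natural way to form these empirical sums is sketched below in the scalar case. The plug-in formulas are an assumption of this sketch, not formulas stated in the text: $\Sigma$ is estimated by the sum of the squared block sums divided by the total length of the blocks, and $D$ by the average derivative over the observations. The function name and the toy blocks are hypothetical.

```python
def block_variance_estimates(blocks, theta_hat, m, dm):
    """Plug-in estimates of D, Sigma and (D' Sigma^{-1} D)^{-1}, scalar case.

    m(x, theta) is the moment function and dm(x, theta) its derivative in theta.
    """
    n_obs = sum(len(block) for block in blocks)  # total length of the complete blocks
    # Block sums M(B_j, theta_hat)
    M = [sum(m(x, theta_hat) for x in block) for block in blocks]
    # Sigma = E_A[tau_A]^{-1} E_A[M M'] estimated by a ratio of empirical sums
    sigma_hat = sum(v * v for v in M) / n_obs
    # D = E_mu[dm/dtheta] estimated by the average derivative over the observations
    d_hat = sum(dm(x, theta_hat) for block in blocks for x in block) / n_obs
    # Asymptotic variance (D' Sigma^{-1} D)^{-1}, which is Sigma / D^2 in dimension one
    avar_hat = sigma_hat / (d_hat * d_hat)
    return d_hat, sigma_hat, avar_hat


# Mean of the chain: m(x, theta) = x - theta, so dm/dtheta = -1.
blocks = [[1, 3, 0], [0], [2, 1, 0], [4, 0], [1, 1, 0]]
theta_hat = 13 / 12  # total of the blocks divided by their total length
d_hat, sigma_hat, avar_hat = block_variance_estimates(
    blocks, theta_hat, lambda x, t: x - t, lambda x, t: -1.0
)
print(d_hat, sigma_hat, avar_hat)
```

For the mean, $D = -1$, so the estimated asymptotic variance reduces to the estimate of $\Sigma$.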

Remark 5 Asymptotic covariance matrix: Our asymptotic covariance matrix
$(D' \Sigma^{-1} D)^{-1}$ is to be compared with the asymptotic covariance matrix $V_\theta$ of
Kitamura (1997)'s estimator, which coincides with the asymptotic covariance matrix of the
optimally weighted GMM estimator. Both matrices are very similar:
$V_\theta = (D' S^{-1} D)^{-1}$, where $S$ is the counterpart of our $\Sigma$ for weakly
dependent processes:

$$S = \lim_{n \to \infty} n^{-1}\, \mathbb{E}\left[ \left(\sum_{i=1}^{n} m(X_i, \theta_0)\right) \left(\sum_{i=1}^{n} m(X_i, \theta_0)\right)' \right].$$

For a process being both weakly dependent and Markovian (and in particular in the i.i.d.
case), $S = \Sigma$ and therefore $V_\theta = (D' \Sigma^{-1} D)^{-1}$.

The case of over-identification ($r > p$) is an important feature, especially for econometric
applications. In such a case, the statistic $2 r_n(\hat{\theta}_n)$ may be considered to test
the moment equation (1):

Theorem 3.4 : Under the assumptions of Theorem 3.3, if the moment equation (1)
holds, then we have

$$2 r_n(\hat{\theta}_n) \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_{r-p}.$$

We now turn to a theorem equivalent to Theorem 3.1. In the over-identified case, the
likelihood ratio statistic used to test $\theta = \theta_0$ must be corrected. We now define

$$W_{1,n}(\theta) = 2 r_n(\theta) - 2 r_n(\hat{\theta}_n).$$

The ReBEL confidence region of nominal level $1 - \alpha$ in the over-identified case is now
given by

$$C^1_{n,\alpha} = \left\{ \theta \in \mathbb{R}^p \;\middle|\; W_{1,n}(\theta) \le F^{-1}_{\chi^2_p}(1 - \alpha) \right\}.$$

Theorem 3.5 : Under the assumptions of Theorem 3.3, the likelihood ratio statistic
for $\theta = \theta_0$ is asymptotically $\chi^2_p$:

$$W_{1,n}(\theta_0) \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_p$$

and $C^1_{n,\alpha}$ is then an asymptotic confidence region of nominal level $1 - \alpha$.



likelihood ratio (Guggenberger and Smith 2005; Kitamura 1997; Kitamura et al. 2004;
Qin and Lawless 1994). Let $\theta = (\gamma, \beta)'$ be in
$\mathbb{R}^q \times \mathbb{R}^{p-q}$, where $\gamma \in \mathbb{R}^q$ is the parameter
of interest and $\beta \in \mathbb{R}^{p-q}$ is a nuisance parameter. Assume that the true
value of the parameter of interest is $\gamma_0$. The empirical likelihood ratio statistic in
this case becomes

$$W_{2,n}(\gamma) = 2 \left( \inf_{\beta} r_n((\gamma, \beta)') - \inf_{\theta} r_n(\theta) \right) = 2 \left( \inf_{\beta} r_n((\gamma, \beta)') - r_n(\hat{\theta}_n) \right),$$

and the empirical likelihood confidence region is given by

$$C^2_{n,\alpha} = \left\{ \gamma \in \mathbb{R}^q \;\middle|\; W_{2,n}(\gamma) \le F^{-1}_{\chi^2_q}(1 - \alpha) \right\}.$$

Theorem 3.6 : Under the assumptions of Theorem 3.3,

$$W_{2,n}(\gamma_0) \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_q$$

and $C^2_{n,\alpha}$ is then an asymptotic confidence region of nominal level $1 - \alpha$.

4. The case of general Harris chains 
4.1. Algorithm 

As explained in the introduction, the splitting technique introduced in Nummelin (1978)
allows us to extend our algorithm to general Harris recurrent chains. The idea is to extend
the original chain to a "virtual" chain with an atom. The splitting technique relies on
the crucial notion of small set. Additional definitions are needed: a set $S \in \mathcal{E}$
is said to be small if there exist $\delta > 0$, a positive integer $q$ and a probability
measure $\Phi$ supported by $S$ such that, for all $x \in S$, $A \in \mathcal{E}$,

$$\Pi^q(x, A) \ge \delta\, \Phi(A), \tag{3}$$

$\Pi^q$ being the $q$-th iterate of the transition probability $\Pi$. Note that an accessible
small set always exists for $\psi$-irreducible chains (Jain and Jamison 1967).



In the case $q > 1$, a first step is typically to reduce the order to 1 by stacking lagged
values (an example is given in Section 5.2). Nevertheless, this complicates the exposition
and the proofs, since the resulting transition probability has no density and since
the splitting technique leads to 1-dependence instead of independence. See Bertail et al.
(2009) and Adamczak (2008) on these issues. For simplicity, we assume in the following
that $q = 1$ and that $\Phi$ has a density $\phi$ with respect to some reference measure
$\lambda(\cdot)$. The idea to construct the split chain $(X, W)$ is the following:

• if $X_i \notin S$, generate (conditionally on $X_i$) $W_i$ as a Bernoulli random variable
with parameter $\delta$;

• if $X_i \in S$, generate (conditionally on $X_i$) $W_i$ as a Bernoulli random variable
with parameter $\delta \phi(X_{i+1}) / p(X_i, X_{i+1})$,

where $p$ is the transition density of the chain $X$ with respect to $\lambda$. This
construction essentially relies on the fact that, under the minorization condition (3),
$\Pi(x, A)$ may be written on $S$ as a mixture:
$\Pi(x, A) = (1 - \delta)\,\dfrac{\Pi(x, A) - \delta \Phi(A)}{1 - \delta} + \delta\, \Phi(A)$,
which is constant (independent of the starting point $x$) when the second component is picked
(see Bertail and Clémençon 2006, for details).

January 12, 2013 15:22 Journal of Nonparametric Statistics main l5'0211

When constructed this way, the split chain is an atomic Markov chain, with marginal
distribution equal to the original distribution of $X$ (see Meyn and Tweedie 2009,
page 427). The atom is then $A = S \times \{1\}$. In practice, we will only need to know
when the split chain visits the atom, i.e. we only need to simulate $W_i$ when $X_i \in S$.
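The drawing of the regeneration indicators can be sketched as follows when the transition density is known. Everything specific in this example is an assumption chosen for illustration: a Gaussian AR(1) chain with coefficient 0.5, the small set $S = [-1, 1]$, the uniform density $\phi$ on $S$, and the function names. Only the Bernoulli parameter $\delta \phi(X_{i+1}) / p(X_i, X_{i+1})$ comes from the construction above.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


RHO, HALF = 0.5, 1.0  # illustrative AR coefficient and small set S = [-HALF, HALF]


def p(x, y):
    """Transition density of X_{i+1} = RHO * X_i + N(0, 1) noise."""
    return normal_pdf(y - RHO * x)


def phi(y):
    """Uniform density on the small set S."""
    return 0.5 / HALF if abs(y) <= HALF else 0.0


# Minorization constant: minimum of p(x, y) / phi(y) over x, y in S,
# reached when |y - RHO * x| is largest, i.e. equal to HALF + RHO * HALF.
DELTA = normal_pdf(HALF + RHO * HALF) / phi(0.0)


def regeneration_flags(path, rng):
    """Draw W_i for each i with X_i in S; W_i = 1 marks a regeneration time."""
    flags = []
    for x, x_next in zip(path, path[1:]):
        if abs(x) <= HALF:
            prob = DELTA * phi(x_next) / p(x, x_next)  # at most 1 by the minorization
            flags.append(1 if rng.random() < prob else 0)
        else:
            flags.append(0)  # W_i is not needed outside S: the atom is S x {1}
    return flags


rng = random.Random(0)
path, x = [], 0.0
for _ in range(2000):
    x = RHO * x + rng.gauss(0.0, 1.0)
    path.append(x)
flags = regeneration_flags(path, rng)
print(sum(flags), "regeneration times out of", len(flags), "transitions")
```

A regeneration can only be flagged when both $X_i$ and $X_{i+1}$ lie in $S$, because $\phi$ vanishes outside the small set.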

Those visits to the atom are therefore the regeneration dates of the chain, and the
number of visits acts as a sample size. In practice, the choice of the small set is then
decisive for the performance of the algorithm. A balance needs to be achieved: if $S$ were
chosen too large, it would be visited very often, but the minorization condition (3) would
likely be poor and therefore $\delta$ would be small. This would lead to many realizations
$W_i = 0$ and few $W_i = 1$. Most of the visits of $X_i$ to the small set would then be wasted
since they would not give a regeneration time. This balance is not a curse: it gives a natural
data-driven tuning of the small set and avoids the difficulties arising in the choice
of a kernel bandwidth, for example. For a discussion on the practical choice of the small
set, see Bertail and Clémençon (2006).

The return time conditions are now defined as uniform moment conditions over the
small set:

H0(S, κ) : $\sup_{x \in S} \mathbb{E}_x[\tau_S^\kappa] < \infty$,
H0(S, κ, ν) : $\mathbb{E}_\nu[\tau_S^\kappa] < \infty$.

The block-moment conditions become

H1(S, κ, m) : $\sup_{x \in S} \mathbb{E}_x[\ldots] < \infty$,
H1(S, κ, ν, m) : $\mathbb{E}_\nu[\ldots] < \infty$.



Unfortunately, the Nummelin technique involves the transition density of the chain,
which is of course unknown in a nonparametric approach. An approximation $p_n$ of this
density can however be computed easily by using standard kernel methods. This leads
us to the following version of the empirical likelihood program.

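Algorithm 2 Approximate regenerative block EL construction:

(1) Find an estimator $p_n$ of the transition density (for instance a Nadaraya-Watson
estimator).

(2) Choose a small set $S$ and a density $\phi$ on $S$ and evaluate
$\delta = \min_{x, y \in S} \left\{ p_n(x, y) / \phi(y) \right\}$.

(3) When $X$ visits $S$, generate $W_i$ as a Bernoulli with parameter
$\delta \phi(X_{i+1}) / p_n(X_i, X_{i+1})$. If $W_i = 1$, the approximate split chain
$(X_i, W_i)$ visits the atom $A = S \times \{1\}$ and $i$ is an approximate regenerative time.
These times define the approximate return times $\hat{\tau}_A(j)$.

(4) Count the number of visits to $A$ up to time $n$:
$\hat{l}_n + 1 = \sum_{i=1}^{n} \mathbf{1}_{(X_i, W_i) \in A}$.

(5) Divide the observed trajectory $X^{(n)} = (X_1, \ldots, X_n)$ into $\hat{l}_n + 2$ blocks
corresponding to the pieces of the sample path between approximate return times to the atom
$A$,

$$\hat{B}_0 = (X_1, \ldots, X_{\hat{\tau}_A(1)}), \quad \hat{B}_1 = (X_{\hat{\tau}_A(1)+1}, \ldots, X_{\hat{\tau}_A(2)}), \quad \ldots,$$
$$\hat{B}_{\hat{l}_n} = (X_{\hat{\tau}_A(\hat{l}_n)+1}, \ldots, X_{\hat{\tau}_A(\hat{l}_n+1)}), \quad \hat{B}^{(n)}_{\hat{l}_n+1} = (X_{\hat{\tau}_A(\hat{l}_n+1)+1}, \ldots, X_n),$$

with the convention $\hat{B}^{(n)}_{\hat{l}_n+1} = \emptyset$ when $\hat{\tau}_A(\hat{l}_n + 1) = n$.

(6) Drop the first block $\hat{B}_0$ and the last one $\hat{B}^{(n)}_{\hat{l}_n+1}$ (possibly
empty when $\hat{\tau}_A(\hat{l}_n + 1) = n$).

(7) Define

$$M(\hat{B}_j, \theta) = \sum_{i=\hat{\tau}_A(j)+1}^{\hat{\tau}_A(j+1)} m(X_i, \theta).$$

Evaluate the empirical log-likelihood ratio $\hat{r}_n(\theta)$ (practically on a grid of the
set of interest):

$$\hat{r}_n(\theta) = -\sup_{(q_1, \ldots, q_{\hat{l}_n})} \left\{ \log\left[\prod_{j=1}^{\hat{l}_n} \hat{l}_n q_j\right] \;\middle|\; \sum_{j=1}^{\hat{l}_n} q_j \, M(\hat{B}_j, \theta) = 0, \ \sum_{j=1}^{\hat{l}_n} q_j = 1 \right\}.$$

Using Lagrange arguments, this can be more easily calculated as

$$\hat{r}_n(\theta) = \sup_{\lambda \in \mathbb{R}^r} \left\{ \sum_{j=1}^{\hat{l}_n} \log\left[1 + \lambda' M(\hat{B}_j, \theta)\right] \right\}.$$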



4.2. Main theorem 

The practical use of this algorithm crucially relies on the preliminary computation of a
consistent estimator of the transition density. We thus consider some conditions on the
uniform consistency of the density estimator $p_n$. These assumptions are satisfied for the
usual kernel or wavelet estimators of the transition density.

H3 For a sequence of nonnegative real numbers $(\alpha_n)_{n \in \mathbb{N}}$ converging to
$0$ as $n \to \infty$, $p(x, y)$ is estimated by $p_n(x, y)$ at the rate $\alpha_n$ for the mean
square error when error is measured by the $L^\infty$ loss over $S \times S$:

$$\sup_{(x, y) \in S \times S} \left| p_n(x, y) - p(x, y) \right|^2 = O_\nu(\alpha_n), \quad \text{as } n \to \infty.$$

H4 The minorizing probability $\Phi$ is such that $\inf_{x \in S} \phi(x) > 0$.

H5 The densities $p$ and $p_n$ are bounded over $S^2$ and
$\inf_{x, y \in S} p_n(x, y) / \phi(y) > 0$.

Since the choice of $\Phi$ is left to the statistician, we can use for instance the uniform
distribution over $S$, even if it may not be optimal to do so. In such a case, H4 is
automatically satisfied. Similarly, it is not difficult to construct an estimator $p_n$
satisfying the constraints of H5.
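Results of the previous section can then be extended to Harris chains:

Theorem 4.1 : Let $\mu$ be the invariant measure of the chain, and
$\theta_0 \in \mathbb{R}^p$ be the parameter of interest, satisfying
$\mathbb{E}_\mu[m(X, \theta_0)] = 0$. Consider $A = S \times \{1\}$ an atom of
the split chain, $\tau_A$ the hitting time of $A$ and $B = (X_1, \ldots, X_{\tau_A})$. Assume
the hypotheses H3, H4 and H5, and suppose that
$\mathbb{E}_A[M(B, \theta_0) M(B, \theta_0)']$ is of full rank.

(a) If H0(S, 4, ν) and H0(S, 2) hold as well as H1(S, 4, ν, m) and H1(S, 2, m), then
we have in the just-identified case ($r = p$):

$$2 \hat{r}_n(\theta_0) \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_p$$

and therefore

$$\hat{C}_{n,\alpha} = \left\{ \theta \in \mathbb{R}^p \;\middle|\; 2 \cdot \hat{r}_n(\theta) \le F^{-1}_{\chi^2_p}(1 - \alpha) \right\}$$

is an asymptotic confidence region of level $1 - \alpha$.

(b) Under the additional assumptions H2(a), H2(b) and H2(d),

$$\hat{\theta} = \arg\inf_{\theta} \{ \hat{r}_n(\theta) \}$$

is a consistent estimator of $\theta_0$. If in addition H2(c) holds, then
$\sqrt{n}(\hat{\theta} - \theta_0)$ is asymptotically normal.

(c) In the case of over-identification ($r > p$), we have:

$$\hat{W}_{1,n}(\theta_0) = 2 \hat{r}_n(\theta_0) - 2 \hat{r}_n(\hat{\theta}) \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_p$$

and

$$\hat{C}^1_{n,\alpha} = \left\{ \theta \in \mathbb{R}^p \;\middle|\; \hat{W}_{1,n}(\theta) \le F^{-1}_{\chi^2_p}(1 - \alpha) \right\}$$

is an asymptotic confidence region of level $1 - \alpha$. The moment equation (1) can be
tested by using the following convergence in law:

$$2 \hat{r}_n(\hat{\theta}) \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_{r-p}.$$

(d) Let $\theta = (\gamma, \beta)'$, where $\gamma \in \mathbb{R}^q$ and
$\beta \in \mathbb{R}^{p-q}$. Under the hypothesis $\gamma = \gamma_0$,

$$\hat{W}_{2,n}(\gamma_0) = 2 \inf_{\beta} \hat{r}_n((\gamma_0, \beta)') - 2 \hat{r}_n(\hat{\theta}) \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_q$$

and then

$$\hat{C}^2_{n,\alpha} = \left\{ \gamma \in \mathbb{R}^q \;\middle|\; \hat{W}_{2,n}(\gamma) \le F^{-1}_{\chi^2_q}(1 - \alpha) \right\}$$

is an asymptotic confidence region of level $1 - \alpha$ for the parameter of interest $\gamma$.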



5. Some simulation results 



5.1. Illustrative example 



We introduce here an example in order to illustrate the method in a very simple setting
and compare ReBEL with BEL (Kitamura 1997) in a situation that favors neither.



We consider an AR(1), which is also a Markov chain of order 1, defined as follows:
$X_0 = 0$, $\varepsilon_i \sim \mathcal{U}([-\sqrt{12}, \sqrt{12}])$ and $X_i = 0.9 X_{i-1} + \varepsilon_i$,

the $\varepsilon_i$ being i.i.d. uniformly distributed random variables of mean 0 and variance 1. Since
0.9 < 1, the chain is recurrent. The parameter of interest is the mean of the chain, $\mu = 0$.
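The data generating process of this example can be sketched as follows. This is only an illustration: the function name `simulate_ar1`, the seed handling and the half-width $\sqrt{3}$ of the uniform innovations are assumptions, the latter being chosen so that the innovations have mean 0 and variance 1 as stated in the text.

```python
import numpy as np

def simulate_ar1(n, phi=0.9, seed=0):
    """Simulate X_0 = 0 and X_i = phi * X_{i-1} + eps_i for i = 1..n."""
    rng = np.random.default_rng(seed)
    half_width = np.sqrt(3.0)  # assumption: U[-sqrt(3), sqrt(3)] has variance 1
    eps = rng.uniform(-half_width, half_width, size=n)
    x = np.empty(n + 1)
    x[0] = 0.0
    for i in range(1, n + 1):
        x[i] = phi * x[i - 1] + eps[i - 1]
    return x
```

A call such as `simulate_ar1(1000)` returns the starting value followed by 1000 observations, on which the small set and the regeneration times can then be determined.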

We choose in the simulations a small set $S = [-a; a]$. On each simulation, $a$ is chosen
from a small grid in order to maximize the number of regeneration times as explained
in section 4.1. The ReBEL algorithm is then implemented and coverage probabilities are
calculated for the nominal level 95%.

We also compute BEL coverage probabilities, using non-overlapping blocks of constant
length the integer part of $n^{1/3}$, where $n$ is the data set length.
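For comparison purposes, the constant-length blocking used for BEL can be sketched as below. The function name `bel_block_sums` and its input `m_values` (the values $m(X_i,\theta)$ for a scalar moment function) are hypothetical. The integer part of $n^{1/3}$ is computed with an integer correction, because the floating-point value of `1000 ** (1 / 3)` falls slightly below 10.

```python
import numpy as np

def bel_block_sums(m_values):
    """Sum m(X_i, theta) over non-overlapping blocks of length floor(n^(1/3))."""
    m = np.asarray(m_values, dtype=float)
    n = len(m)
    b = int(round(n ** (1.0 / 3.0)))
    while b ** 3 > n:            # correct a rounding overshoot
        b -= 1
    while (b + 1) ** 3 <= n:     # correct a rounding undershoot
        b += 1
    k = n // b                   # number of complete blocks
    sums = m[:k * b].reshape(k, b).sum(axis=1)
    return b, sums
```

Observations left over after the last complete block are simply discarded in this sketch.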

We get the following results, for 10 000 replications:

Table A1 should be approximately here

On these simulations, ReBEL seems to be better fitted. This may be due to the fact
that ReBEL's small set length is data driven whereas BEL's block length is constant over
the replications. In the following section, we set the small set once and for all replications.



5.2. Estimation of the threshold crossing rate of a TGARCH 

The aim of this section is to show that ReBEL can be adapted to complex data and can
outperform competing methods. Some applications of empirical likelihood to dependent
data have been carried out, such as Li and Wang (2003) on Stanford Heart Transplant
data or Owen (2001) on bristlecone pine tree rings. In his book, Owen motivates his use
of empirical likelihood to study the tree rings data set by its asymmetry: "we could not
capture such asymmetry in an AR model with normally distributed errors" (Owen 2001,
page 168).

To motivate the use of empirical likelihood, we propose here to generate data
sets with strong asymmetry properties to illustrate the applicability of the method.
For this, we consider a family of models introduced to study financial data, the
TGARCH (Rabemananjara and Zakoian 1993). This model has been designed to handle
non symmetric data, such as stock return series in presence of asymmetry in the
volatility. We have in mind in particular applications to the modeling of electricity price series
(Cornec and Harari-Kermadec 2008). These series are very hard to model because of
their very asymmetric behavior and because of the presence of very sharp peaks alternating
with periods of low volatility. Application of ReBEL to these series seems to be
promising, see Cornec and Harari-Kermadec (2008).

The data generating process is the following:

$$\begin{cases} X_i = 0.97 X_{i-1} + \varepsilon_i & \text{with } X_0 = 0,\\ \varepsilon_i = \sigma_i \nu_i & \text{with } \nu_i \sim NID(0,1),\\ \sigma_i = 1 + 0.5|\varepsilon_{i-1}| + 0.4\,\varepsilon^+_{i-1} & \text{with } \varepsilon_0 = 0, \end{cases}$$

where the $\nu_i$ are standard normal random values independent of all other random
variables and $x^+$ is the positive part of $x$: $x^+ = \max\{0, x\}$. Of course, in the following,
this generating mechanism is considered unknown. Retrieving the underlying mechanism
by just looking at the data is a difficult task and this motivates the use of a non parametric
approach in this context.
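For readers who wish to reproduce data of this kind, the three recursions can be sketched as follows. The function name `simulate_tgarch_ar` and the seed handling are illustrative assumptions; the coefficients are those of the data generating process above.

```python
import numpy as np

def simulate_tgarch_ar(n, seed=0):
    """Simulate X_i = 0.97 X_{i-1} + eps_i with threshold-type volatility."""
    rng = np.random.default_rng(seed)
    nu = rng.standard_normal(n)          # i.i.d. N(0, 1) innovations
    x = np.zeros(n + 1)                  # x[0] = X_0 = 0
    eps = np.zeros(n + 1)                # eps[0] = eps_0 = 0
    for i in range(1, n + 1):
        # sigma_i = 1 + 0.5 |eps_{i-1}| + 0.4 (eps_{i-1})^+
        sigma = 1.0 + 0.5 * abs(eps[i - 1]) + 0.4 * max(eps[i - 1], 0.0)
        eps[i] = sigma * nu[i - 1]
        x[i] = 0.97 * x[i - 1] + eps[i]
    return x, eps
```

With `x, eps = simulate_tgarch_ar(1000)`, the quantity `np.mean(x[1:] > 10)` gives a crude empirical frequency of threshold crossing on one trajectory.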

It is straightforward that $(X, \varepsilon)$ is a Markov chain of order 1. As $\varepsilon_{i-1} = X_{i-1} - 0.97 X_{i-2}$,
it is immediate that $X$ is a Markov chain of order 2. The ReBEL algorithm can then be applied
to the bivariate chain $(X_i, X_{i-1})$, which is a Markov chain of order 1.

In practice, the order $k$ of the Markov chain is unknown and is therefore to be estimated.
We propose the following heuristic procedure to estimate the order:

(1) Suppose $k = 1$.

(2) Build the blocks according to Algorithm 1.

(3) Evaluate the moment condition over the blocks: $Y_j = M(B_j,\theta)$.

(4) Perform a test of independence (or at least of non correlation) of the $(Y_1, \dots, Y_{l_n-1})$,
for example by testing the nullity of $\rho$ given by $Y_i = \rho Y_{i-1} + \eta_i$. Other tests may be
considered as well, such as tests based on kernel estimators of the density.

(5) If the independence (or non correlation) is rejected, set $k = k + 1$ and restart at point
2.
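Step (4) of the heuristic can be illustrated by a simple lag-one correlation check. This is a sketch under assumptions that are not in the text: the block sums are scalar, the function name `lag1_uncorrelated` is hypothetical, and the test uses the usual normal approximation $\sqrt{l}\,\hat\rho \approx N(0,1)$ under non correlation, with the critical value 1.96 for a 5% level.

```python
import math
import numpy as np

def lag1_uncorrelated(y, z_crit=1.96):
    """Return (accept, rho_hat) for H0: no lag-1 correlation in the block sums y."""
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    rho = float(yc[1:] @ yc[:-1] / (yc @ yc))  # lag-1 sample autocorrelation
    z = math.sqrt(len(y)) * rho                # approx. N(0, 1) under H0
    return abs(z) <= z_crit, rho
```

If the first returned value is `False`, the order $k$ is increased by one and the blocks are rebuilt, as in step (5).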



In order to apply Kitamura (1997)'s Block Empirical Likelihood (BEL), $X$ must be
weakly dependent. As the sum of the coefficients of $|\varepsilon_{i-1}|$ and $\varepsilon^+_{i-1}$ is smaller than 1, the
volatility of the data generating process is contracting. Therefore one can easily check
the weak dependence of the process.



5.3. Confidence intervals 

We are interested in estimating the probability of crossing a high threshold. This is an 
interesting problem because of the asymmetry of the data and a problem of practical 
interest for electricity prices. Indeed, production means are only profitable above some 
level. The probability of crossing the profitability threshold is therefore essential to estimate.
The parameter of interest is defined here as:

$$\theta_0 = E_\mu\left[\mathbf{1}_{\{X_i > 10\}}\right] = P_\mu(X_i > 10),$$

and its value (estimated on a simulated data set of size 10^) is $\theta_0 = 0.1479$. A first
advantage of ReBEL is that such a parameter, defined with respect to the underlying
invariant measure $\mu$, is naturally handled by this method, whereas no unbiased estimating
equation is available for BEL.

We simulate a data set of length 1000 and perform a test to estimate the order of the
chain. We build an estimator $p_n$ of the transition density $p$ based on Gaussian kernels.
The hypothesis that the order is 1 is rejected whereas order 2 is not. As the chain is then
considered 2-dimensional, we consider a small set of the form $S^2$ where $S$ is an interval. The
interval $S$ has been chosen empirically to maximize the number of blocks and is equal
to $[-1.3; 4.7]$. It is set once and for all and is not updated at each replication. On the
graphic corresponding to one simulation, the chain is in the small set $S^2$ when the trajectory of
$X$ is in between the 2 plain black lines $y = -1.3$ and $y = 4.7$ for two consecutive times.
For $i$ such that $(X_i, X_{i-1})$ visits $S^2$, we generate a Bernoulli $B_i$ as in Algorithm 1 and if $B_i = 1$,
$i$ is an approximate renewal time. On the simulation, $S^2$ is visited 231 times, leading to 18
renewal times, marked by a vertical green line.

Figure A1 should be approximately here

The block length adapts to the local behavior of the chain: regions of low volatility lead
to small blocks (between 500 and 700) whereas regions with high values lead to larger
blocks (like the 142-484 block). It can be noticed that high values concentrate in few
blocks, because the dependence is well captured by Algorithm 1. The BEL procedure leads
to constant-length blocks, which cannot adapt to the dependence structure. As suggested
by Hall et al. (1995), the BEL blocks used in the following are of length $n^{1/3} = 10$ and
then the chain is divided into 100 non overlapping blocks. The overlapping blocks perform
poorly and won't be considered in the following.

Now that we have ReBEL approximately regenerative blocks, we can apply Theorem 4.1(a)
to obtain a confidence interval for $\theta$. We give a BEL confidence interval as
well for comparison. We also consider two simpler methods as references for the performances
of ReBEL: the simple sample mean $\hat\theta_{\text{mean}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{\{X_i>10\}}$ and the mean over
the regenerative blocks

$$\hat\theta_{\text{trunc}} = \frac{\sum_{k=1}^{\hat l_n} \sum_{i=\hat\tau(k)+1}^{\hat\tau(k+1)} \mathbf{1}_{\{X_i>10\}}}{\sum_{k=1}^{\hat l_n} \left(\hat\tau(k+1) - \hat\tau(k)\right)} = \frac{\sum_{k=1}^{\hat l_n} \sum_{i=\hat\tau(k)+1}^{\hat\tau(k+1)} \mathbf{1}_{\{X_i>10\}}}{\hat\tau(\hat l_n + 1) - \hat\tau(1)}.$$



The simple mean does not deal with the dependence and we expect it to perform poorly.
The second reference method trunc uses the splitting technique in its expression, but in
practice it only differs from mean by the fact that it discards the first and last blocks.
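The two reference estimators can be computed in a few lines once the approximate renewal times are available. In the sketch below, the name `mean_and_trunc` and the convention that `x[i - 1]` stores $X_i$ (so that `renewal_times` holds 1-based times $\hat\tau(1) < \dots < \hat\tau(\hat l_n + 1)$) are assumptions made for the illustration.

```python
import numpy as np

def mean_and_trunc(x, renewal_times, threshold=10.0):
    """Return the sample mean and the block-truncated mean of 1{X_i > threshold}."""
    x = np.asarray(x, dtype=float)
    hits = x > threshold
    tau = np.asarray(renewal_times)
    first, last = int(tau[0]), int(tau[-1])
    theta_mean = float(hits.mean())
    # observations X_{tau(1)+1}, ..., X_{tau(l_n+1)}: 0-based slice [first:last]
    theta_trunc = float(hits[first:last].mean())
    return theta_mean, theta_trunc
```

As the text indicates, the second value differs from the first only through the observations of the first and last blocks, which are left out of the slice.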

An important point here is that to build confidence intervals with these two methods,
an estimator of the variance is needed. In fact, although these estimators seem much simpler than
BEL and ReBEL, the difficulty is mainly transferred to the estimation of their variances.
This issue is difficult in a general dependence setting. In the applications, we used a
bootstrap estimator of the variance of mean and trunc according to Götze and Künsch
(1996).

Having in mind that difficulty, it is important to stress that ReBEL and BEL confidence
intervals do not rely on an estimation of the variance of the estimator. This property
is well-known for methods based on empirical likelihood that automatically estimate
a variance at each point of the confidence interval, see for example the Continuously
updated GMM, Antoine et al. (2007). Additional results on the self-normalized properties
of these methods have been investigated in Bertail et al. (2008).

Figure A2 should be approximately here

Mean and BEL estimators and confidence intervals appear biased to the right. This is
most likely due to the effect of data from the first and last blocks, discarded by ReBEL
and trunc. It can be noticed that, BEL blocks being more numerous, the confidence
interval is tighter for BEL than for ReBEL.

To compare the considered methods, we also compute coverage probabilities and type-II
errors (which is equivalent to power in terms of test) of confidence intervals with nominal
level 95%. To test the behavior under the alternative, we evaluate the statistics at the
erroneous points $\theta = \theta_0 + 5/\sqrt{n}$ and $\theta = \theta_0 + 10/\sqrt{n}$ and check if the null hypothesis is
rejected or not.

The 2 000 simulation results are summarized in Table A2 for n = 1000, 5000 and 10000.

Table A2 should be approximately here

Globally, ReBEL's coverage probabilities are better than BEL's, whereas its type-II errors
are bigger. This is coherent with Figure A2: the ReBEL confidence interval leads to better
coverage probabilities but is larger than BEL's (and therefore type-II errors are bigger for
ReBEL). Mean and trunc perform well for n = 1000 but show some limits for n = 5000
and 10000. It seems that ReBEL is the only method converging to the nominal level 95%.

Coverage probabilities at other nominal levels can also be investigated, and we run
a Monte-Carlo experiment (10 000 repetitions) in order to confirm the adequacy to the
asymptotic distribution achieved by the ReBEL algorithm. Data set lengths are 10 000.

Figure A3 should be approximately here

Figure A3 shows the adequacy of the log likelihood to the asymptotic distribution given
by Theorem 4.1. The QQ-plot is almost linear and is close to the 45° line.
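A numerical counterpart of this graphical check is to compare, at a few nominal levels, the proportion of Monte-Carlo statistics falling below the corresponding $\chi^2_1$ quantile. The sketch below is only illustrative: the names `chi2_1_quantile` and `empirical_coverage` are hypothetical, the parameter is assumed scalar ($p = 1$), and the $\chi^2_1$ quantile is obtained from the standard normal quantile through the identity $\chi^2_1 = Z^2$.

```python
from statistics import NormalDist

def chi2_1_quantile(level):
    """Quantile of the chi-square law with 1 degree of freedom (chi2_1 = Z**2)."""
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)
    return z * z

def empirical_coverage(two_r_values, levels=(0.5, 0.9, 0.95)):
    """Proportion of statistics 2 r_n(theta_0) below each chi2_1 quantile."""
    n = len(two_r_values)
    return {
        lev: sum(v <= chi2_1_quantile(lev) for v in two_r_values) / n
        for lev in levels
    }
```

Applied to the simulated values of $2\hat r_n(\theta_0)$, the returned proportions should be close to the nominal levels when the asymptotic approximation is adequate.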



6. Conclusion 

This paper proposes an alternative point of view on dependent data sets and a corresponding
semi-parametric methodology. Random length blocks allow the method to adapt to the
dependence structure of the data. We have shown that ReBEL enjoys desirable properties
corresponding to that of optimal reference methods for strong-mixing series. Simulations
indicate that our algorithm at least competes with Kitamura's BEL when both methods
can be applied.

This method seems to be a promising tool to handle dependent data when classical 
parametric models do not perform well, for example in presence of asymmetry and non 
normality of the innovations. 
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Appendix A. Proofs 

A.1. Lemmas for the atomic case

Denote $Y_j = M(B_j,\theta_0)$, $\bar Y = \frac{1}{l_n}\sum_{j=1}^{l_n} Y_j$ and define

$$S^2_{l_n} = \frac{1}{l_n}\sum_{j=1}^{l_n} M(B_j,\theta_0)M(B_j,\theta_0)' = \frac{1}{l_n}\sum_{j=1}^{l_n} Y_j Y_j' \quad\text{and}\quad S^{-2}_{l_n} = (S^2_{l_n})^{-1}.$$

To demonstrate Theorem 3.1 we need 2 technical lemmas.

Lemma A.1: Assume that $E_A[M(B,\theta_0)M(B,\theta_0)']$ exists and is full-rank, with ordered
eigenvalues $\sigma_p \ge \dots \ge \sigma_1 > 0$. Then, assuming H0(1,$\nu$) and H0(1), we have, in probability,

$$S^2_{l_n} \xrightarrow[n\to\infty]{} E_A[M(B,\theta_0)M(B,\theta_0)'].$$

Therefore, for all $u$ with $\|u\| = 1$,

$$\sigma_1 + o_\nu(1) \le u' S^2_{l_n} u \le \sigma_p + o_\nu(1).$$

Proof: The convergence of $S^2_{l_n}$ is a LLN for the sum of a random
number of random variables, and is a straightforward corollary of Theorem 6
of (Teicher and Chow 1988, chapter 5.2, page 131).



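Lemma A.2: Assuming H0(1,$\nu$), H0(2) and H1(2,m), we have

$$\max_{1\le j\le l_n} \|Y_j\| = o_\nu(n^{1/2}).$$

Proof: By H1(2,m),

$$E_A\left[\left(\sum_{i=1}^{\tau_A} \|m(X_i,\theta_0)\|\right)^2\right] < \infty,$$

and then,

$$E_A\left[\|Y_1\|^2\right] \le E_A\left[\left(\sum_{i=1}^{\tau_A} \|m(X_i,\theta_0)\|\right)^2\right] < \infty.$$

By Lemma A.1 of Antoine et al. (2007), the maximum of $n$ i.i.d. real-valued random
variables with finite variance is $o(n^{1/2})$. Let $Z_n$ be the maximum of $n$ independent copies
of $\|Y_1\|$; $Z_n$ is then such that $Z_n = o_\nu(n^{1/2})$. As $l_n$ is smaller than $n$, $\max_{1\le j\le l_n} \|Y_j\|$ is
bounded by $Z_n$ and therefore $\max_{1\le j\le l_n} \|Y_j\| = o_\nu(n^{1/2})$.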



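A.2. Proof of Theorem 3.1

The likelihood ratio statistic $r_n(\theta_0)$ is the supremum over $\lambda \in \mathbb{R}^p$ of $\sum_{j=1}^{l_n} \log(1 + \lambda' Y_j)$.
The first order condition at the supremum $\lambda_n$ is then:

$$\frac{1}{l_n}\sum_{j=1}^{l_n} \frac{Y_j}{1 + \lambda_n' Y_j} = 0. \qquad (A1)$$

Multiplying by $\lambda_n$ and using $1/(1 + x) = 1 - x/(1 + x)$, we have

$$\frac{1}{l_n}\sum_{j=1}^{l_n} (\lambda_n' Y_j)\left(1 - \frac{\lambda_n' Y_j}{1 + \lambda_n' Y_j}\right) = 0, \quad\text{and then}\quad \lambda_n' \bar Y = \frac{1}{l_n}\sum_{j=1}^{l_n} \frac{(\lambda_n' Y_j)^2}{1 + \lambda_n' Y_j}.$$

Now we may bound the denominators $1 + \lambda_n' Y_j$ by $1 + \|\lambda_n\| \max_j \|Y_j\|$ and then

$$\lambda_n' \bar Y = \frac{1}{l_n}\sum_{j=1}^{l_n} \frac{(\lambda_n' Y_j)^2}{1 + \lambda_n' Y_j} \ge \frac{\lambda_n' S^2_{l_n} \lambda_n}{1 + \|\lambda_n\| \max_j \|Y_j\|}.$$

Multiply both sides by the denominator, $\lambda_n' \bar Y (1 + \|\lambda_n\| \max_j \|Y_j\|) \ge \lambda_n' S^2_{l_n} \lambda_n$ or

$$\lambda_n' \bar Y \ge \lambda_n' S^2_{l_n} \lambda_n - \|\lambda_n\| \max_j \|Y_j\|\, \lambda_n' \bar Y.$$

Dividing by $\|\lambda_n\|$ and setting $u = \lambda_n/\|\lambda_n\|$, we have

$$u' \bar Y \ge \|\lambda_n\| \left[ u' S^2_{l_n} u - \max_j \|Y_j\|\, u' \bar Y \right]. \qquad (A2)$$

Now we control the terms between the square brackets. First, by Lemma A.1, $u' S^2_{l_n} u$ is
bounded between $\sigma_1 + o_\nu(1)$ and $\sigma_p + o_\nu(1)$. Second, by Lemma A.2, $\max_j \|Y_j\| = o_\nu(n^{1/2})$.
Third, the CLT applied to the $Y_j$'s gives $\bar Y = O_\nu(n^{-1/2})$. Then, inequality (A2) gives

$$O_\nu(n^{-1/2}) \ge \|\lambda_n\| \left[ u' S^2_{l_n} u - o_\nu(n^{1/2}) O_\nu(n^{-1/2}) \right] = \|\lambda_n\| \left( u' S^2_{l_n} u + o_\nu(1) \right),$$

and $\|\lambda_n\|$ is then $O_\nu(n^{-1/2})$.

Using the first order condition (A1) as well as the equality $1/(1 + x) = 1 - x + x^2/(1 + x)$,
we get

$$0 = \frac{1}{l_n}\sum_{j=1}^{l_n} Y_j \left(1 - \lambda_n' Y_j + \frac{(\lambda_n' Y_j)^2}{1 + \lambda_n' Y_j}\right) = \bar Y - S^2_{l_n} \lambda_n + \frac{1}{l_n}\sum_{j=1}^{l_n} \frac{Y_j (\lambda_n' Y_j)^2}{1 + \lambda_n' Y_j}.$$

The last term is $o_\nu(n^{-1/2})$ by Lemma A.2 of Antoine et al. (2007) and then

$$\lambda_n = S^{-2}_{l_n} \bar Y + o_\nu(n^{-1/2}).$$

Now, developing the log up to the second order,

$$2 r_n(\theta_0) = 2\sum_{j=1}^{l_n} \log(1 + \lambda_n' Y_j) = 2 l_n \lambda_n' \bar Y - l_n \lambda_n' S^2_{l_n} \lambda_n + 2\sum_{j=1}^{l_n} \eta_j,$$

where the $\eta_j$ are such that, for some positive $B$ and with probability tending to 1,
$|\eta_j| \le B |\lambda_n' Y_j|^3$. Since, by Lemma A.2, $\max_j \|Y_j\| = o_\nu(n^{1/2})$,

$$\sum_{j=1}^{l_n} \|Y_j\|^3 \le n \max_j \|Y_j\| \left(\frac{1}{n}\sum_{j=1}^{l_n} \|Y_j\|^2\right) = n\, o_\nu(n^{1/2})\, O_\nu(1) = o_\nu(n^{3/2}),$$

from which we find

$$\left|\sum_{j=1}^{l_n} \eta_j\right| \le B \|\lambda_n\|^3 \sum_{j=1}^{l_n} \|Y_j\|^3 = O_\nu(n^{-3/2})\, o_\nu(n^{3/2}) = o_\nu(1).$$

Finally,

$$2 r_n(\theta_0) = 2 l_n \lambda_n' \bar Y - l_n \lambda_n' S^2_{l_n} \lambda_n + o_\nu(1) = l_n \bar Y' S^{-2}_{l_n} \bar Y + o_\nu(1) \xrightarrow[n\to\infty]{\mathcal{L}} \chi^2_p.$$

This concludes the proof of Theorem 3.1.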



A.3. Proof of Theorem 3.2

We keep the notations of the previous subsection. Note that instead of E/i[Yj] = 0, we 



have 



n \l In 



because $l_n \sim n/E_A[\tau_A]$. The beginning of the proof is similar to that of Theorem 3.1: the
misspecification is not significant at first order: $\bar Y$ remains $O_\nu(n^{-1/2})$. We obtain:

$$2 r_n(\theta_0) = l_n \bar Y' S^{-2}_{l_n} \bar Y + o_\nu(1).$$



Y—5^y¥,A[TA\/ln is asymptotically Gaussian with variance Ka[M{B, 9o)M{B, Oq)'], which 
is the limit in probability of Sf'^. Therefore, 



A.4. Proof of Theorem 3.3

In order to prove Theorem 3.3 we use a result established by Qin and Lawless (1994).

Lemma A.3 (Qin and Lawless, 1994): Let $Z, Z_1, \dots, Z_n \sim F$ be i.i.d. observations in $\mathbb{R}^d$
and consider a function $g : \mathbb{R}^d \times \mathbb{R}^p \to \mathbb{R}^r$ such that $E_F[g(Z,\theta_0)] = 0$. Suppose that the
following hypotheses hold:

(1) $E_F[g(Z,\theta_0) g'(Z,\theta_0)]$ is positive definite,

(2) $\partial g(z,\theta)/\partial\theta$ is continuous and bounded in norm by an integrable function $G(z)$ in
a neighborhood $V$ of $\theta_0$,

(3) $\|g(z,\theta)\|^3$ is bounded by $G(z)$ on $V$,

(4) the rank of $E_F[\partial g(Z,\theta_0)/\partial\theta]$ is $p$,

(5) $\partial^2 g(z,\theta)/\partial\theta\,\partial\theta'$ is continuous and bounded by $G(z)$ on $V$.

Then, the maximum empirical likelihood estimator $\hat\theta_n$ is a consistent estimator and
$\sqrt{n}(\hat\theta_n - \theta_0)$ is asymptotically normal with mean zero.

Set

$$Z = B_1 = (X_{\tau_A(1)+1}, \dots, X_{\tau_A(2)}) \in \bigcup_{n\in\mathbb{N}} \mathbb{R}^n$$

and $g(Z,\theta) = M(B_1,\theta)$. Expectation under $F$ is then replaced by $E_A$. Theorem 3.3 is a
straightforward application of Lemma A.3 as soon as the assumptions hold.

By assumption, $E_A[M(B,\theta_0)M(B,\theta_0)']$ is of full rank. This implies (1).

By H2(a), there is a neighborhood $V$ of $\theta_0$ and a function $N$ such that, for all $i$
between $\tau_A(1) + 1$ and $\tau_A(2)$, $\partial m(X_i,\theta)/\partial\theta$ is continuous on $V$ and bounded in norm by
$N(X_i)$. $\partial M(B_1,\theta)/\partial\theta$ is then continuous as a sum of continuous functions and is bounded
for $\theta$ in $V$ by $L(B_1) = \sum_{i=\tau_A(1)+1}^{\tau_A(2)} N(X_i)$. Since $N$ is such that $E_\mu[N(X)] < \infty$, we have
by Kac's Theorem,

$$E_\mu[N(X)] = E_A\left[\sum_{i=\tau_A(1)+1}^{\tau_A(2)} N(X_i)\right] \Big/ E_A[\tau_A] = E_A[L(B_1)]/E_A[\tau_A] < \infty.$$

The bounding function $L(B_1)$ is then integrable. This gives assumption (2). Assumption (5)
is derived from H2(c) by the same arguments.
By H2(d), \\m{Xi,e)f is bounded by N{Xi) for 6 in V, and then 



rA{2) rA{2) 
i=rA (1) + 1 i=rA (1) + 1 

Thus, \\M{Bi,e)f is also bounded by L{Bi) for 9 in V, and hypotheses (3) follows. 
By Kac's Theorem,

$$E_A[\tau_A]^{-1} E_A[\partial M(B_1,\theta_0)/\partial\theta] = E_\mu[\partial m(X_1,\theta_0)/\partial\theta],$$

which is supposed to be of full rank by H2(b). Thus $E_A[\partial M(B_1,\theta_0)/\partial\theta]$ is of full rank
and this gives assumption (4). This concludes the proof of Theorem 3.3.

Under the same hypotheses, Theorem 2 and Corollaries 4 and 5 of Qin and Lawless
( I994I ) hold. They give respectively our Theorems 13. 5^ [3^ and [3^ 



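A.5. Proof of Theorem 4.1

Suppose that we know the real transition density $p$. The chain can then be split with the
Nummelin technique as above. We get an atomic chain $X$. Let us denote by $B_j$ the blocks
obtained from this chain. Theorem 3.1 can then be applied to $Y_j = M(B_j,\theta_0)$.

Unfortunately $p$ is unknown and then we cannot use the $Y_j$. Instead, we have the
vectors $\hat Y_j = M(\hat B_j,\theta_0)$, built on approximately regenerative blocks. To prove
Theorem 4.1 we essentially need to control the difference between the two statistics
$\bar Y = \frac{1}{l_n}\sum_{j=1}^{l_n} Y_j$ and $\hat Y = \frac{1}{\hat l_n}\sum_{j=1}^{\hat l_n} \hat Y_j$. This can be done by using Lemmas 5.2 and 5.3 of
Bertail and Clémençon (2006): under H0(S,4,$\nu$), we get

$$\left|\frac{\hat l_n}{n} - \frac{l_n}{n}\right| = O_\nu(\alpha_n^{1/2}) \qquad (A3)$$

and under H1(S,4,$\nu$,m) and H1(S,2,m),

$$\left\|\frac{1}{n}\sum_{j=1}^{\hat l_n} \hat Y_j - \frac{1}{n}\sum_{j=1}^{l_n} Y_j\right\| = O_\nu(n^{-1}\alpha_n^{1/2}).$$

With some straightforward calculus, we have

$$\left\|\hat Y - \bar Y\right\| \le \frac{n}{\hat l_n}\left\|\frac{1}{n}\sum_{j=1}^{\hat l_n} \hat Y_j - \frac{1}{n}\sum_{j=1}^{l_n} Y_j\right\| + \left|\frac{n}{\hat l_n} - \frac{n}{l_n}\right| \left\|\frac{1}{n}\sum_{j=1}^{l_n} Y_j\right\|. \qquad (A4)$$

Since $l_n \sim n/E_A[\tau_A]$, equation (A3) gives

$$\frac{n}{\hat l_n} = \frac{n}{l_n}\left(1 + \frac{\hat l_n - l_n}{l_n}\right)^{-1} = O_\nu(E_A[\tau_A]) \left(1 + O_\nu(E_A[\tau_A])\, O_\nu(\alpha_n^{1/2})\right)^{-1} = O_\nu(E_A[\tau_A])$$

and

$$\left|\frac{n}{\hat l_n} - \frac{n}{l_n}\right| = O_\nu(E_A[\tau_A])\, O_\nu(\alpha_n^{1/2}) = O_\nu(\alpha_n^{1/2}).$$

From this and equation (A4), we deduce:

$$\left\|\hat Y - \bar Y\right\| \le O_\nu(E_A[\tau_A])\, O_\nu(n^{-1}\alpha_n^{1/2}) + O_\nu(\alpha_n^{1/2})\, O_\nu(n^{-1/2}) = O_\nu(\alpha_n^{1/2} n^{-1/2}). \qquad (A5)$$

Therefore

$$n^{1/2}\hat Y = n^{1/2}\bar Y + n^{1/2}(\hat Y - \bar Y) = n^{1/2}\bar Y + O_\nu(\alpha_n^{1/2}).$$

Using this and the CLT for the $Y_j$, we show that $n^{1/2}\hat Y$ is asymptotically Gaussian.

The same kind of arguments give a control on the difference between empirical variances.
Consider

$$\hat S^2_{\hat l_n} = \frac{1}{\hat l_n}\sum_{j=1}^{\hat l_n} \hat Y_j \hat Y_j'.$$

By Lemma 5.3 of Bertail and Clémençon (2006) we have, under H1(S,4,$\nu$,m) and
H1(S,2,m),

$$\left\|\frac{\hat l_n}{n}\hat S^2_{\hat l_n} - \frac{l_n}{n} S^2_{l_n}\right\| = O_\nu(\alpha_n),$$

and then

$$\left\|\hat S^2_{\hat l_n} - S^2_{l_n}\right\| \le \frac{n}{\hat l_n}\left\|\frac{\hat l_n}{n}\hat S^2_{\hat l_n} - \frac{l_n}{n} S^2_{l_n}\right\| + \left|\frac{n}{\hat l_n} - \frac{n}{l_n}\right| \left\|\frac{l_n}{n} S^2_{l_n}\right\| = O_\nu(\alpha_n) + O_\nu(\alpha_n^{1/2}) = o_\nu(1). \qquad (A6)$$

The proof of Theorem 3.1 is then also valid for the approximated blocks $\hat B_j$ and
reduces to the study of the square of a self-normalized sum based on the pseudo-blocks.

We have $\hat r_n(\theta_0) = \sup_{\lambda\in\mathbb{R}^p} \left\{\sum_{j=1}^{\hat l_n} \log\left[1 + \lambda' \hat Y_j\right]\right\}$. Let $\hat\lambda_n = \hat S^{-2}_{\hat l_n} \hat Y + o_\nu(n^{-1/2})$ be the
optimum value of $\lambda$; we have

$$2\hat r_n(\theta_0) = 2\hat l_n \hat\lambda_n' \hat Y - \sum_{j=1}^{\hat l_n} (\hat\lambda_n' \hat Y_j)^2 + o_\nu(1) = \hat l_n \hat Y' \hat S^{-2}_{\hat l_n} \hat Y + o_\nu(1).$$

Using the controls given by equations (A5) and (A6), we get

$$2\hat r_n(\theta_0) = \left[l_n + O_\nu(n\alpha_n^{1/2})\right] \times \left[\bar Y' + O_\nu(\alpha_n^{1/2} n^{-1/2})\right] \times \left[S^{-2}_{l_n} + o_\nu(1)\right] \times \left[\bar Y + O_\nu(\alpha_n^{1/2} n^{-1/2})\right] + o_\nu(1).$$

Developing this product, the main term is $l_n \bar Y' S^{-2}_{l_n} \bar Y \sim 2 r_n(\theta_0)$ and all other terms are
$o_\nu(1)$, yielding

$$2\hat r_n(\theta_0) = l_n \bar Y' S^{-2}_{l_n} \bar Y + o_\nu(1) \xrightarrow[n\to\infty]{\mathcal{L}} \chi^2_p.$$

Results (b), (c) and (d) can be derived from the atomic case by using the same arguments.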


January 12, 2013, 15:22, Journal of Nonparametric Statistics



    n      ReBEL    BEL
   250     0.92     0.82
   500     0.94     0.88
  1000     0.94     0.91

Table A1. Coverage rates of confidence intervals for the mean of an AR(1). Comparison of ReBEL and BEL
for different data set lengths. Nominal level is 0.95.




Figure A1. The plain curve is a chain of length 1000. The horizontal lines limit the small set.
The 18 renewal times are marked by vertical lines. High values are marked by a dot.





          $\theta = \theta_0$                  $\theta = \theta_0 + 5/\sqrt{n}$           $\theta = \theta_0 + 10/\sqrt{n}$
    n     ReBEL  BEL  mean  trunc     ReBEL  BEL  mean  trunc     ReBEL  BEL  mean  trunc
  1000     54    55    58    58        24    13    14    20        07    02    02    06
  5000     88    67    74    74        52    27    30    29        12    01    01    02
 10000     92    70    76    77        59    31    34    33        11    02    02    02

Table A2. Coverage probabilities and type-II errors (percent) under the null and two alternatives, for
ReBEL against BEL, and 2 reference methods. Nominal level is 95%.

Figure A2. The plain curve gives the ReBEL likelihood, whereas the dotted curve shows the
BEL likelihood. The red horizontal line marks the 95% level and $\theta_0$ is marked by a circle on
that line. The mean CI is the magenta segment whereas the trunc CI is the larger dotted cyan
segment.



[Panels: ReBEL vs chi2; BEL vs chi2]

Figure A3. QQ-plots of 10 000 Monte-Carlo repetitions of ReBEL statistic versus $\chi^2_1$ quantiles.
The solid reference line is the 45° line. The reference circles on that line mark the 50%, 90% and
95% levels. Data set length is n = 10 000.



